DNN-based Speech Synthesis for Indian Languages from ASCII text

Authors

Srikanth Ronanki

Siva Reddy Gangireddy

Bajibabu Bollepalli

Simon King

Abstract

Text-to-speech synthesis in Indian languages has seen a lot of progress over the past decade, partly due to the annual Blizzard Challenges. These systems assume the text to be written in Devanagari or Dravidian scripts, which have nearly phonemic orthographies. However, the most common form of computer interaction among Indians is transliterated text written in ASCII. Such text is generally noisy, with many spelling variations for the same word. In this paper we evaluate three approaches to synthesizing speech from such noisy ASCII text: a naive Uni-Grapheme approach, a Multi-Grapheme approach, and a supervised Grapheme-to-Phoneme (G2P) approach. These methods first convert the ASCII text to a phonetic script, and then train a Deep Neural Network to synthesize speech from it. We train and test our models on Blizzard Challenge datasets that were transliterated to ASCII using crowdsourcing. Our experiments on Hindi, Tamil and Telugu demonstrate that our models generate speech from ASCII text that is of competitive quality compared to speech synthesized from the native scripts. All the accompanying transliterated datasets are released for public access.
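The difference between a per-letter (uni-grapheme) and a multi-letter (multi-grapheme) view of transliterated ASCII text can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how units are chosen, so the `MULTI_GRAPHEMES` inventory, the greedy longest-match rule, and both function names are assumptions made purely for illustration.

```python
# Hypothetical inventory of multi-letter units; NOT taken from the paper.
MULTI_GRAPHEMES = {"aa", "ee", "oo", "kh", "gh", "ch", "th", "dh", "sh", "bh", "ph"}


def uni_graphemes(word):
    """Treat every ASCII letter as its own unit."""
    return [ch for ch in word.lower() if ch.isalpha()]


def multi_graphemes(word, inventory=MULTI_GRAPHEMES):
    """Greedy longest-match split into units from `inventory`.

    Falls back to a single letter when no multi-letter unit matches.
    This matching rule is an illustrative assumption.
    """
    letters = "".join(ch for ch in word.lower() if ch.isalpha())
    max_len = max((len(u) for u in inventory), default=1)
    units = []
    i = 0
    while i < len(letters):
        for size in range(max_len, 0, -1):
            chunk = letters[i:i + size]
            if size == 1 or chunk in inventory:
                units.append(chunk)
                i += len(chunk)
                break
    return units


if __name__ == "__main__":
    print(uni_graphemes("bhaarat"))
    print(multi_graphemes("bhaarat"))
```

Under these assumptions, a spelling such as "bhaarat" yields seven single-letter units in the first view and five units ("bh", "aa", "r", "a", "t") in the second. Either unit sequence could then serve as the input representation for a downstream acoustic model.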


Similar resources

Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of Gaussian mixture model hidden Markov model (GMM-HMM) based forced alignment is that the phoneme boundaries are not explicitly modeled. In an earlier work, we had proposed the use of signal processing cues in tan...


Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis

In this paper, we propose to use hidden state vector obtained from recurrent neural network (RNN) as a context vector representation for deep neural network (DNN) based statistical parametric speech synthesis. While in a typical DNN based system, there is a hierarchy of text features from phone level to utterance level, they are usually in 1-hot-k encoded representation. Our hypothesis is that,...


Indian Language Screen Readers and Syllable Based Festival Text-to-Speech Synthesis System

This paper describes the integration of commonly used screen readers, namely, NVDA [NVDA 2011] and ORCA [ORCA 2011] with Text to Speech (TTS) systems for Indian languages. A participatory design approach was followed in the development of the integrated system to ensure that the expectations of visually challenged people are met. Given that India is a multilingual country (22 official languages...


Overview of NITECH HMM-based text-to-speech system for Blizzard Challenge 2014

This paper describes a hidden Markov model based text-to-speech (TTS) system developed at the Nagoya Institute of Technology (NITECH) for Blizzard Challenge 2014. The tasks of Blizzard Challenge 2014 are speech synthesis of six Indian languages and multilingual speech synthesis, i.e., Indian language and English. Only Indian language speech data and text are provided as training data. We focused...


A common attribute based unified HTS framework for speech synthesis in Indian languages

State-of-the-art approaches to speech synthesis are unit selection based concatenative speech synthesis (USS) and hidden Markov model based text-to-speech synthesis (HTS). The former is based on waveform concatenation of subword units, while the latter is based on generation of an optimal parameter sequence from subword HMMs. The quality of an HMM based synthesiser in the HTS framework, crucial...



Publication date: 2016
